Draft of article based on discussions about TCP Info data and caveats analyzing it by jduckles · Pull Request #9 · m-lab/knowledgebase

jduckles · 2026-07-01T01:13:30Z

Hey @sermpezis and @robertodauria, could you please review and edit this as you see fit? I pulled it together from the discussion, document, and Slack context using the new KB article Claude skill in this repo, under `.claude/skills/mlab-kb-article`.

… about analyzing it

robertodauria

Thanks! I've added some comments — see below.

robertodauria · 2026-07-02T15:24:14Z

+
+<!-- TODO: Add direct link to Pavlos' TCPinfo Colab notebook once it has a stable public URL. -->
+<!-- TODO: Add section on unnesting the raw.Snapshots array in BigQuery for within-connection time series analysis. -->
+<!-- FIXME: Verify that the RTT/RTTVar fields cited above match the current ndt.tcpinfo schema exactly — column paths may differ between the ndt.tcpinfo view and raw tables. -->


I would expect the verification to happen before the KB article is posted. Could you please confirm that the TCPInfo schema matches?

robertodauria · 2026-07-02T15:25:51Z

+
+Files are stored in `.zst`-compressed JSONL format. Pavlos Sermpezis has a [Colab notebook](https://colab.research.google.com/) for snapshot-level analysis — ask on the M-Lab Discuss list or Slack for the current link.
+
+<!-- TODO: Add direct link to Pavlos' TCPinfo Colab notebook once it has a stable public URL. -->


TODOs in code comments aren't very visible — I'd rather wait until we have a public link to add here (if posting this isn't urgent), or create an issue/a CU task to document what is missing before merging this PR, perhaps assigning the person this is blocked on.

Also, AFAIK M-Lab's Slack isn't "public" in the way the Discuss list is; it's invitation-only.

robertodauria · 2026-07-02T15:26:45Z

+Files are stored in `.zst`-compressed JSONL format. Pavlos Sermpezis has a [Colab notebook](https://colab.research.google.com/) for snapshot-level analysis — ask on the M-Lab Discuss list or Slack for the current link.
+
+<!-- TODO: Add direct link to Pavlos' TCPinfo Colab notebook once it has a stable public URL. -->
+<!-- TODO: Add section on unnesting the raw.Snapshots array in BigQuery for within-connection time series analysis. -->


Same: either add the section as part of this PR, or create an issue instead of a TODO in a comment.

(this applies to every other TODO in this file)
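
To make the scope of the missing section concrete, here is a minimal Python sketch of what unnesting does: it turns one connection record holding a `Snapshots` array into one flat row per snapshot. It is an illustration, not the BigQuery query itself. The key names `UUID`, `Timestamp`, and `TCPInfo` are assumptions; only `Snapshots`, `RTT`, and `RTTVar` are named in this PR, and the exact column paths still need to be verified against the schema.

```python
def flatten_snapshots(record):
    """Yield one flat row per snapshot in a single connection record.

    Assumed (hypothetical) layout: the record has a "UUID" key and a
    "Snapshots" list; each snapshot has "Timestamp" and a "TCPInfo" dict
    with "RTT" and "RTTVar". Missing keys become None instead of raising.
    """
    uuid = record.get("UUID")
    for index, snapshot in enumerate(record.get("Snapshots") or []):
        tcp_info = snapshot.get("TCPInfo") or {}
        yield {
            "uuid": uuid,
            "seq": index,  # position of the snapshot within the connection
            "timestamp": snapshot.get("Timestamp"),
            "rtt": tcp_info.get("RTT"),
            "rttvar": tcp_info.get("RTTVar"),
        }
```

Each yielded row keeps the connection identifier, so rows from many connections can be concatenated and later grouped per connection for time series analysis.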

robertodauria · 2026-07-02T15:32:37Z

+ORDER BY num_snapshots
+```
+
+Comparing the two outputs makes the noise problem concrete: the first query will show a large fraction of 1–2 snapshot rows; the second (UUID-joined) query will show a clean distribution concentrated at 40–100 snapshots.


Since we're inviting a comparison here, I think it would be helpful if the two queries used the same date.

Both queries also apply `LIMIT 10000` in the inner query with no `ORDER BY`, which I believe makes the selected rows non-deterministic. The percentage computed from that sample would then vary from run to run as well.
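
One way to make the sample reproducible is to select rows by a hash of the connection UUID instead of an unordered `LIMIT`. The sketch below shows the idea in Python, assuming the input is a list of `(uuid, num_snapshots)` pairs; the function names and the 1-in-100 default are hypothetical and not taken from the article.

```python
import hashlib


def in_sample(uuid, keep_one_in=100):
    """Deterministically keep roughly 1 in `keep_one_in` UUIDs.

    The decision depends only on the UUID, never on row order.
    """
    digest = hashlib.sha256(uuid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % keep_one_in == 0


def short_connection_share(rows, max_snapshots=2, keep_one_in=100):
    """Percent of sampled rows with at most `max_snapshots` snapshots.

    `rows` is an iterable of (uuid, num_snapshots) pairs.
    Returns None when the sample is empty.
    """
    sampled = [n for uuid, n in rows if in_sample(uuid, keep_one_in)]
    if not sampled:
        return None
    short = sum(1 for n in sampled if n <= max_snapshots)
    return 100.0 * short / len(sampled)
```

Because membership in the sample is a pure function of the UUID, rerunning the analysis on the same date gives the same percentage. In BigQuery the same idea can be expressed by filtering on `MOD(ABS(FARM_FINGERPRINT(...)), 100) = 0` over the UUID column, whose exact name would need to be checked against the schema.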

robertodauria · 2026-07-02T15:38:59Z

+gs://archive-measurement-lab/ndt/tcpinfo/YYYY/MM/DD/
+```
+
+Files are stored in `.zst`-compressed JSONL format. Pavlos Sermpezis has a [Colab notebook](https://colab.research.google.com/) for snapshot-level analysis — ask on the M-Lab Discuss list or Slack for the current link.


> Files are stored in .zst-compressed JSONL format

This is correct but omits the tarball layer: users will find .tgz archives containing per-connection .jsonl.zst files.
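
For readers of the article, the two layers could be illustrated roughly as follows. This is a sketch under assumptions: it expects an already downloaded `.tgz` opened as a binary file object, uses the third-party `zstandard` package for the inner layer, and treats each line of a decompressed member as one JSON object. The function names are hypothetical.

```python
import json
import tarfile


def _zstd_decompress(data):
    # Assumption: the third-party `zstandard` package is installed.
    import zstandard
    return zstandard.ZstdDecompressor().decompressobj().decompress(data)


def iter_snapshot_records(tgz_file, decompress=_zstd_decompress):
    """Yield (member_name, parsed_json) for every line of every
    .jsonl.zst member inside a gzip-compressed tarball.

    `tgz_file` is a binary file object; `decompress` maps the compressed
    bytes of one member to its JSONL bytes.
    """
    with tarfile.open(fileobj=tgz_file, mode="r:gz") as archive:
        for member in archive:
            if not (member.isfile() and member.name.endswith(".jsonl.zst")):
                continue  # skip directories and unrelated files
            compressed = archive.extractfile(member).read()
            for line in decompress(compressed).splitlines():
                if line.strip():
                    yield member.name, json.loads(line)
```

The outer loop handles the tarball layer and the inner loop handles the per-connection compressed file, which mirrors the layout described above.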


Draft of article based on discussions about TCP Info data and caveats…

f277076

… about analyzing it

jduckles requested review from robertodauria and sermpezis July 1, 2026 01:14

jduckles self-assigned this Jul 1, 2026

jduckles added the documentation Improvements or additions to documentation label Jul 1, 2026

robertodauria requested changes Jul 2, 2026


jduckles and others added 3 commits July 3, 2026 10:01

Update src/content/articles/tcpinfo-snapshot-analysis.md

0d1bdbb

Co-authored-by: Roberto D'Auria <roberto@measurementlab.net>

Update src/content/articles/tcpinfo-snapshot-analysis.md

f4a288e

Co-authored-by: Roberto D'Auria <roberto@measurementlab.net>

Update src/content/articles/tcpinfo-snapshot-analysis.md

5871db1

Co-authored-by: Roberto D'Auria <roberto@measurementlab.net>

jduckles wants to merge 4 commits into main from newarticle/tcpinfo-snapshot-analysis
